Data Analysis

Comprehensive Data Cleaning & Exploratory Data Analysis of Job Market Trends

Authors and Affiliations

Connor Coulter, Boston University
Wei Wang, Boston University
Balqis Bevi Abdul Hannan Kanaga, Boston University

1 Import Data

Loaded dataset: (72498, 131)
(index) | ID | LAST_UPDATED_DATE | LAST_UPDATED_TIMESTAMP | DUPLICATES | POSTED | EXPIRED | DURATION | SOURCE_TYPES | SOURCES | URL | ... | NAICS_2022_2 | NAICS_2022_2_NAME | NAICS_2022_3 | NAICS_2022_3_NAME | NAICS_2022_4 | NAICS_2022_4_NAME | NAICS_2022_5 | NAICS_2022_5_NAME | NAICS_2022_6 | NAICS_2022_6_NAME
0 | 1f57d95acf4dc67ed2819eb12f049f6a5c11782c | 9/6/2024 | 2024-09-06 20:32:57.352 Z | 0.0 | 6/2/2024 | 6/8/2024 | 6.0 | [\n "Company"\n] | [\n "brassring.com"\n] | [\n "https://sjobs.brassring.com/TGnewUI/Sear... | ... | 44.0 | Retail Trade | 441.0 | Motor Vehicle and Parts Dealers | 4413.0 | Automotive Parts, Accessories, and Tire Retailers | 44133.0 | Automotive Parts and Accessories Retailers | 441330.0 | Automotive Parts and Accessories Retailers
1 | 0cb072af26757b6c4ea9464472a50a443af681ac | 8/2/2024 | 2024-08-02 17:08:58.838 Z | 0.0 | 6/2/2024 | 8/1/2024 | NaN | [\n "Job Board"\n] | [\n "maine.gov"\n] | [\n "https://joblink.maine.gov/jobs/1085740"\n] | ... | 56.0 | Administrative and Support and Waste Managemen... | 561.0 | Administrative and Support Services | 5613.0 | Employment Services | 56132.0 | Temporary Help Services | 561320.0 | Temporary Help Services
2 | 85318b12b3331fa490d32ad014379df01855c557 | 9/6/2024 | 2024-09-06 20:32:57.352 Z | 1.0 | 6/2/2024 | 7/7/2024 | 35.0 | [\n "Job Board"\n] | [\n "dejobs.org"\n] | [\n "https://dejobs.org/dallas-tx/data-analys... | ... | 52.0 | Finance and Insurance | 524.0 | Insurance Carriers and Related Activities | 5242.0 | Agencies, Brokerages, and Other Insurance Rela... | 52429.0 | Other Insurance Related Activities | 524291.0 | Claims Adjusting
3 | 1b5c3941e54a1889ef4f8ae55b401a550708a310 | 9/6/2024 | 2024-09-06 20:32:57.352 Z | 1.0 | 6/2/2024 | 7/20/2024 | 48.0 | [\n "Job Board"\n] | [\n "disabledperson.com",\n "dejobs.org"\n] | [\n "https://www.disabledperson.com/jobs/5948... | ... | 52.0 | Finance and Insurance | 522.0 | Credit Intermediation and Related Activities | 5221.0 | Depository Credit Intermediation | 52211.0 | Commercial Banking | 522110.0 | Commercial Banking
4 | cb5ca25f02bdf25c13edfede7931508bfd9e858f | 6/19/2024 | 2024-06-19 07:00:00.000 Z | 0.0 | 6/2/2024 | 6/17/2024 | 15.0 | [\n "FreeJobBoard"\n] | [\n "craigslist.org"\n] | [\n "https://modesto.craigslist.org/sls/77475... | ... | 99.0 | Unclassified Industry | 999.0 | Unclassified Industry | 9999.0 | Unclassified Industry | 99999.0 | Unclassified Industry | 999999.0 | Unclassified Industry

5 rows × 131 columns
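
The report shows only the output of the import step. A minimal sketch of how such a load might look with pandas is given here; the in-memory CSV and the `load_postings` helper are hypothetical stand-ins, since the report does not name the source file or show its code.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV; the report does not name the source file.
sample_csv = io.StringIO(
    "ID,POSTED,EXPIRED,DURATION\n"
    "a1,6/2/2024,6/8/2024,6\n"
    "b2,6/2/2024,8/1/2024,\n"
)

def load_postings(source):
    """Read the postings file and print its shape, mirroring the output above."""
    df = pd.read_csv(source)
    print(f"Loaded dataset: {df.shape}")
    return df

df = load_postings(sample_csv)
preview = df.head()  # first rows, as in the preview table
```

With the real file, `source` would be a path instead of the in-memory buffer. Empty fields such as the second row's DURATION are read as NaN.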

2 Data Cleaning & Preprocessing

2.1 Create Derived Columns (before any dropping)

Derived non-null: {'INDUSTRY_DISPLAY': 72454, 'SALARY_DISPLAY': 30808}
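
The rules used to build INDUSTRY_DISPLAY and SALARY_DISPLAY are not shown in the report. The sketch below illustrates one plausible approach under two stated assumptions: the display industry is taken from NAICS_2022_6_NAME, and the display salary is SALARY when present, otherwise the midpoint of SALARY_FROM and SALARY_TO. The sample rows are invented for illustration.

```python
import pandas as pd

# Invented sample rows; column names come from the dataset preview and column list.
df = pd.DataFrame({
    "NAICS_2022_6_NAME": ["Claims Adjusting", "Commercial Banking", None],
    "SALARY": [90000.0, None, None],
    "SALARY_FROM": [80000.0, 60000.0, None],
    "SALARY_TO": [100000.0, 70000.0, None],
})

# Assumed rule: the display industry is the most detailed NAICS name.
df["INDUSTRY_DISPLAY"] = df["NAICS_2022_6_NAME"]

# Assumed rule: use SALARY when present, else the midpoint of the posted range.
midpoint = (df["SALARY_FROM"] + df["SALARY_TO"]) / 2
df["SALARY_DISPLAY"] = df["SALARY"].fillna(midpoint)

# Count non-null values per derived column, as in the output above.
derived_non_null = df[["INDUSTRY_DISPLAY", "SALARY_DISPLAY"]].notna().sum().to_dict()
print("Derived non-null:", derived_non_null)
```

Creating these columns before dropping anything matters because the source columns they depend on may be among those removed in the next step.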

2.2 Drop Unnecessary Columns

Remaining columns (first 30): ['LAST_UPDATED_DATE', 'POSTED', 'EXPIRED', 'DURATION', 'MODELED_EXPIRED', 'MODELED_DURATION', 'COMPANY', 'COMPANY_NAME', 'COMPANY_IS_STAFFING', 'EDUCATION_LEVELS', 'EDUCATION_LEVELS_NAME', 'MIN_EDULEVELS', 'MIN_EDULEVELS_NAME', 'MAX_EDULEVELS', 'MAX_EDULEVELS_NAME', 'EMPLOYMENT_TYPE', 'EMPLOYMENT_TYPE_NAME', 'MIN_YEARS_EXPERIENCE', 'MAX_YEARS_EXPERIENCE', 'IS_INTERNSHIP', 'SALARY', 'REMOTE_TYPE', 'REMOTE_TYPE_NAME', 'ORIGINAL_PAY_PERIOD', 'SALARY_TO', 'SALARY_FROM', 'LOCATION', 'CITY', 'CITY_NAME', 'COUNTY']
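
A sketch of the drop step follows. The drop list is an inference from comparing the preview table with the remaining columns (ID, LAST_UPDATED_TIMESTAMP, DUPLICATES, SOURCE_TYPES, SOURCES, and URL no longer appear); the report's full list is not shown, so treat `columns_to_drop` as an assumption.

```python
import pandas as pd

# Invented one-row frame with a few of the original columns.
df = pd.DataFrame({
    "ID": ["a1"],
    "LAST_UPDATED_DATE": ["9/6/2024"],
    "LAST_UPDATED_TIMESTAMP": ["2024-09-06 20:32:57.352 Z"],
    "DUPLICATES": [0.0],
    "POSTED": ["6/2/2024"],
    "URL": ["https://example.com/job"],
})

# Inferred from the outputs above; the report's actual drop list is not shown.
columns_to_drop = ["ID", "LAST_UPDATED_TIMESTAMP", "DUPLICATES",
                   "SOURCE_TYPES", "SOURCES", "URL"]

# errors="ignore" skips names that are absent instead of raising KeyError.
df = df.drop(columns=columns_to_drop, errors="ignore")
print("Remaining columns (first 30):", list(df.columns[:30]))
```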

2.3 Handle Missing Values
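
The report does not show how missing values were handled. As an illustration only, the sketch below applies a common pattern: drop columns that are mostly empty, fill numeric gaps with the median, and give missing categories an explicit label. The 50% threshold, the "Not specified" label, and the MOSTLY_EMPTY column are all assumptions.

```python
import pandas as pd

# Invented sample; MOSTLY_EMPTY is a hypothetical column used to show the threshold.
df = pd.DataFrame({
    "DURATION": [6.0, None, 35.0, 48.0],
    "REMOTE_TYPE_NAME": ["Remote", None, None, "Not Remote"],
    "MOSTLY_EMPTY": [None, None, None, 1.0],
})

# Share of missing values per column, used to decide how to treat each one.
missing_share = df.isna().mean()

# Assumed threshold: drop columns with more than 50% missing values.
df = df.loc[:, missing_share <= 0.5].copy()

# Assumed imputation: median for numeric fields, an explicit label for categories.
df["DURATION"] = df["DURATION"].fillna(df["DURATION"].median())
df["REMOTE_TYPE_NAME"] = df["REMOTE_TYPE_NAME"].fillna("Not specified")
```

Labeling missing categories explicitly, rather than dropping those rows, keeps them visible in later charts such as the remote versus on-site breakdown.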

2.4 Remove Duplicates

Removed 3300 duplicates using ['TITLE', 'COMPANY_NAME', 'LOCATION', 'POSTED']
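
The output above names the four key columns, so the deduplication can be sketched with `DataFrame.drop_duplicates`. The sample rows are invented, and keeping the first occurrence is an assumption, since the report does not say which copy is retained.

```python
import pandas as pd

# Invented rows: the first two match on all four key columns.
df = pd.DataFrame({
    "TITLE": ["Data Analyst", "Data Analyst", "Data Analyst"],
    "COMPANY_NAME": ["Acme", "Acme", "Acme"],
    "LOCATION": ["Dallas, TX", "Dallas, TX", "Austin, TX"],
    "POSTED": ["6/2/2024", "6/2/2024", "6/2/2024"],
})

key_columns = ["TITLE", "COMPANY_NAME", "LOCATION", "POSTED"]
rows_before = len(df)

# Rows count as duplicates when all four key columns match; keep="first" is assumed.
df = df.drop_duplicates(subset=key_columns, keep="first")

removed = rows_before - len(df)
print(f"Removed {removed} duplicates using {key_columns}")
```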

3 Exploratory Data Analysis (EDA)

Top 15 industries selected: ['All Other Professional, Scientific, and Technical Services', 'Other Management Consulting Services', 'Other Scientific and Technical Consulting Services', 'General Medical and Surgical Hospitals', 'Colleges, Universities, and Professional Schools'] ...
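
One way to select the most frequent industries is `value_counts` followed by `head`. This is a sketch under the assumption that the selection is by posting count on INDUSTRY_DISPLAY; the sample uses placeholder industry labels and a smaller `TOP_N` so the result is easy to check by hand.

```python
import pandas as pd

# Placeholder industry labels; None mimics postings without an industry.
df = pd.DataFrame({"INDUSTRY_DISPLAY": ["A", "B", "A", "C", "A", "B", None]})
TOP_N = 2  # the report uses 15

# value_counts sorts by frequency in descending order and ignores missing values.
counts = df["INDUSTRY_DISPLAY"].value_counts()
top_industries = counts.head(TOP_N).index.tolist()

# Keep only postings in the selected industries for the charts that follow.
df_top = df[df["INDUSTRY_DISPLAY"].isin(top_industries)]
print(f"Top {TOP_N} industries selected:", top_industries)
```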

3.1 Job Postings by Industry (Top 15)

3.2 Salary Distribution by Industry (Top 15 only)

3.3 Remote vs. On-Site Jobs

3.4 EDA: Rationale & Insights

3.4.1 Job Postings by Industry

Why: Counting postings by industry shows where hiring demand is concentrated and which sectors are actively recruiting.
Key Insights: The top three industries by number of postings are Temporary Help Services, Miscellaneous Ambulatory Health Care Services, and Semiconductor and Related Device Manufacturing. Job seekers looking for the largest pool of openings can start with these sectors.

3.4.2 Salary Distribution by Industry

Why: Comparing salary distributions across industries gives job seekers a benchmark for negotiation, helps them avoid accepting below-market pay, and points to the industries that currently pay well.
Key Insights: Automotive Parts and Accessories Retailers show a wide salary range, which indicates jobs at many pay levels and more room for negotiation. Barber Shops, by contrast, show a very narrow salary range, which suggests little room for negotiation.
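
The notion of a "wide" or "narrow" salary range can be made concrete with a per-industry interquartile range (IQR). The sketch below is one possible way to quantify it, not the report's own method; the data are invented and the column names INDUSTRY_DISPLAY and SALARY_DISPLAY are taken from the derived columns in Section 2.1.

```python
import pandas as pd

# Invented data: industry A has a wide pay spread, industry B a narrow one.
df = pd.DataFrame({
    "INDUSTRY_DISPLAY": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "SALARY_DISPLAY": [40000, 60000, 80000, 100000, 50000, 51000, 52000, 53000],
})

grouped = df.groupby("INDUSTRY_DISPLAY")["SALARY_DISPLAY"]
spread = pd.DataFrame({
    "median": grouped.median(),
    "q1": grouped.quantile(0.25),
    "q3": grouped.quantile(0.75),
})

# IQR = q3 - q1: the width of the middle half of salaries in each industry.
spread["iqr"] = spread["q3"] - spread["q1"]
spread = spread.sort_values("iqr", ascending=False)
print(spread)
```

The IQR corresponds to the box height in a box plot, so industries at the top of `spread` are those with the tallest boxes.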

3.4.3 Remote vs. On-Site Jobs

Why: Workplace flexibility is a major factor in today’s job market. The remote versus on-site breakdown shows job seekers how many roles match their preferences and lets employers compare their own flexibility with the market.
Key Insights: Most postings (78.3%) do not state whether the role is remote or on-site, which leaves job seekers without this information. About 17% of postings are remote, so remote work is fairly common. Only 3.1% are hybrid, and just 1.6% are explicitly labeled “Not Remote.”
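
Percentages like these can be computed with `value_counts(normalize=True)`. The sketch below uses ten invented postings; the "Not specified" and "Hybrid Remote" labels are assumptions about how categories are named. It also computes shares among only the postings that state a work arrangement, which can be a useful second view when most values are missing.

```python
import pandas as pd

# Ten invented postings; None means the posting does not state a work arrangement.
remote = pd.Series(
    ["Remote", None, None, None, "Hybrid Remote", None, "Not Remote", None, None, None],
    name="REMOTE_TYPE_NAME",
)

# Share of all postings, with missing values shown as their own category.
shares = remote.fillna("Not specified").value_counts(normalize=True).mul(100).round(1)

# Share among postings that do state an arrangement (missing values excluded).
stated_shares = remote.dropna().value_counts(normalize=True).mul(100).round(1)

print(shares)
print(stated_shares)
```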